HOW TO SCAN A BOOK
by John F. Adams
© Proportional Reading 1996
Proportional Reading, P.O. Box 335, Beverly, Mass. 01915 phone (508) 927-9234
CONTENTS
Introductory Notes
Overview of Scanning
How Scanning Books is Different from Other Scanning
Tips on Scanning and Optical Character Recognition
Tips on Editing Text
INTRODUCTORY NOTES
Many people ask, "How do I scan a book?" This article has been written to
answer that question. The truth of the matter is that scanning a book can be
extremely easy if you know what you are doing. Otherwise, it will be a nightmare.
Scanning a book is very different from scanning other types of documents. The
tips in this article should be of great help.
This work was written to help people read using the technique called
Proportional Reading. In this approach the eyes never move. You can read up to
700 words per minute and still feel like you are being read aloud to. Text can
also be read out loud in real human voice at normal reading speed as it is
displayed one word at a time. In order to do this type of reading, the text must
first be in electronic form. The author spent three years learning how to scan
books easily so that any student could quickly turn course material or other
reading material into e-text for Proportional Reading.
The material presented here is essentially chapters 7 and 8 of the Instruction
Manual for Proportional Reading.
Scanning really involves three parts: 1) making a picture of a page (scanning),
2) using an Optical Character Recognition (OCR) program to convert the picture
into typed text, and 3) cleaning up the text afterwards. In actual practice,
scanning and OCR decisions are made before scanning starts.
Overview of Scanning
A scanner is used to transform a book or article into computerized text, if it
is not already on disk or CD ROM. Scanning text can be done in four ways:
1) from the actual book placed on the scanner bed and scanned one or two pages
at a time;
2) from separated pages of the book placed on the scanner one or two pages at a
time;
3) from actual book pages bulk-loaded into the automatic document feeder of the
scanner; or
4) from copies of book pages, which are then either scanned individually or
bulk-loaded into the automatic document feeder.
Scanning can be done almost effortlessly if you choose the right approach. This
article will help you understand what this approach should be.
Scanning involves a little bit of learning, but once a book has been turned into
ASCII text, everyone in a school system can read it without repeating these
steps. It can be mailed on a diskette, sent by modem, and so on.
First, a few words about copyrights. Be sure to get copyright permission before
any wide distribution. Proportional Reading was designed to help people read
who would otherwise not be able to benefit from printed text. Publishers almost
universally are very helpful in allowing special treatment of their works for
the learning disabled and physically disabled.
Furthermore, Proportional Reading is designed for average readers to use on
their own reading material which they already have in their possession. This
private, non-profit copying of books is within purchase rights, and it makes
reading possible for many and increases purchase of books.
Most importantly, the basic thrust of Proportional Reading as applied to
scanning books is to return to the original book for the graphics (charts,
illustrations, drawings, graphs, pictures, etc.) and to see the original text
layout. To this end, Proportional Reading keys to the page numbers of the
original text. As a result, use of the original textbook increases rather than
decreases. This will be especially true as millions of people become able to
read and start to love learning. In all these ways Proportional Reading actually
helps publishers.
Finally, the formatted or Proportionalized version of text requires a special
program to play. So, the formatted text by itself is of little or no use without
both the playing software as well as the original book.
In this article you will learn how to add colored pictures to scanned text.
However, this process adds tremendously to file size and is therefore
impractical except for short articles or articles saved on CD ROM or removable
cartridge. It is usually much easier to refer to the original book for pictures
and other graphics.
How Scanning Books is Different from Other Types of Scanning
The best way to learn how to scan a book efficiently is to start by
understanding how scanning a book differs from other types of scanning. There
are eight major differences. We will see that if a book will lie flat on the
scanner bed, you can scan one or both pages of text at a time. Otherwise, it is
by far easiest to separate the pages and scan one side of a page at a time,
OCR it, spell check it, and add any other special marks before going on to
scan the next page. We will now look at each of the eight major differences
in turn.
1) Page Thickness
Most scanning is designed to be done on standard letter-size, 20 lb paper. This
type of medium runs perfectly through the automatic document feeder. Other
thicknesses of paper will not work well in the automatic document feeder. The
trouble with books is that many pages are too thick and will not even load into
a document feeder. Most textbook pages, on the other hand, are too thin and will
eventually double up as they enter the document feeder. Either way, automatic
processing will jam. In addition, if you are doing two-sided documents, your
collating will be off and all your time will be wasted. In scanning two-sided
documents, you run the whole stack through for the front sides, then run the
whole stack again for the back sides, and then have the computer collate
everything. Any jam will ruin the collation and all the time invested. There is
no way to simply redo the collation; it takes place before editing, and all the
offending pages would have to be cut and repasted, which is a nightmare.
For this reason automatic document feeders should not be used with actual book
pages unless pages are copied first onto 20 lb paper with only one side of the
paper used.
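To see why one jam is so costly, here is a minimal Python sketch of the collation step. It assumes both passes feed the stack in the same order, so that the nth front and the nth back are two sides of the same sheet; the function name and the side labels are made up for illustration and are not taken from any scanner software.

```python
def collate(fronts: list[str], backs: list[str]) -> list[str]:
    """Interleave two single-sided passes into reading order.

    Assumes fronts[i] and backs[i] are the two sides of sheet i,
    that is, both passes fed the stack in the same order.
    """
    if len(fronts) != len(backs):
        # A jam or double feed on either pass leaves the counts unequal,
        # and nothing in the images says which sheet went wrong.
        raise ValueError(f"{len(fronts)} fronts but {len(backs)} backs")
    pages: list[str] = []
    for front, back in zip(fronts, backs):
        pages.extend([front, back])  # front side, then its back side
    return pages

print(collate(["1a", "2a", "3a"], ["1b", "2b", "3b"]))
# ['1a', '1b', '2a', '2b', '3a', '3b']
```

If two sheets went through together on the back pass, the backs list comes up one short. This sketch refuses to continue; pairing by position anyway would attach every later back to the wrong front, which is exactly the ruined collation described above.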
2) Rounded Pages
Books may be divided immediately into two types: those that will lie flat and
those that won't. Sometimes you can push down on the spine of the book to make
the text lie flat. If the text won't lie flat, it curves into the center and
cannot be scanned as is. Many textbooks are designed to make copying impossible
by intentionally making the text flow close to the gutter, or center.
These books can easily be scanned. However, you must first separate the pages.
Be happy about this. Scanning individual pages is much less physical work than
scanning a book. In scanning individual pages there is no lifting and turning
and pressing down on the book. You can sit comfortably in a chair and hardly
move as you scan first one side of a page and then the other side of the same
page and then the next page. Separate the book chapters into different manila
folders.
A separated book has real value after scanning. It is often much easier to read
a book this way than to try to keep the pages open. Also, bookbags become much
lighter when only the relevant chapters are carried around. The trick is to keep
the different chapters in different folders.
3) See Through
If you want to avoid errors on italic and bold letters, you have to use the
highest resolution setting when scanning. This setting also gives you the best
black and white picture quality if you are scanning pictures in the text as
well. The trouble with this setting is that when you scan a typical textbook
page printed on thin, shiny paper, the scanner will see right through the page
and pick up details on the back side. There is a simple way to avoid this
problem: put a black sheet behind the page you are copying. The see-through
problem will disappear immediately. Unfortunately, the belt on automatic
document feeders is white, not black. Therefore, even if you could keep the
pages from jamming, the back side would still "bleed" through.
For this reason it is best to tape a black piece of paper on the underside of
the cover of the scanner and scan the pages one page at a time, or scan from an
open book where the pages are automatically backed up. Alternatively you can
make one-sided copies of the text pages and run these copies through the
document feeder. However, this costs a lot of money and requires a good quality
copier. Regardless of how good the copier is, you will lose quality when you
make copies, and this will cause errors in scanning. When all is said and done,
it is usually best to scan one page at a time, or from an open book that will
lie flat.
4) Text Boxes and Captions
Many books are straight text and these are easy to scan. However, most textbooks
have text boxes on colored backgrounds inserted in the middle of the text. In
addition, graphics of many types with their captions are inserted in the pages.
When text is scanned it ends up in a linear flow. Text boxes and captions can be
very disruptive to reading if they are not moved to the end of the subsection to
which they refer. When text boxes and captions are moved this way they are a joy
to read in a linear flow with the main text.
The best way to do this is to specially mark the text boxes and captions right
after the page is scanned and OCR'd. Here again it is usually best to scan one
separated page at a time, or from an open book that will lie flat.
5) Pictures and Graphics
When you OCR text, the OCRing is done in black and white. Although pictures can
be scanned automatically, they are not scanned in color and are therefore of
little use in today's world of color. Secondly, when pictures are scanned
through the OCR program, if they have not been carefully defined as pictures,
the text on the pictures is pulled out and added to the main body of text during
the actual OCR stage. This creates a very confusing piece of text.
The simple solution to this is to select just the sections of text, captions,
and text boxes you want, in the order you want, ignoring the pictures. The way
to do this is to insert one page at a time and manually zone each page. This
process is much faster than deselecting all the zones you do not want and then
reordering the zones you have left from an automatically zoned page.
To re-add a picture in color, first save the text in ASCII format and open it
in your word processor. Then scan the colored picture using the scanner alone
(not the OCR program) and copy and paste the picture into the word processor
document at the desired point. Choose "screen" resolution so the picture file
will not be too big.
6) Spell Checking
The best way to make sure the text is free from errors is to scan in the highest
quality mode and to scan directly from the text page. The third thing to do is
to spell check each page of text right after it has been scanned and OCR'd.
The reason for doing this now is that you can see a picture of the original
scan along with the misspelled word and immediately see whether the suspicious
word is OK or how to fix the error.
7) Page Numbers and Headers
Book pages often have headers and footers. These need to be removed.
The best way to do this is not to select them for OCR in the first place.
Once the text has been OCR'd, add the page number at the top of the page. This
is very easy to do, as the cursor automatically goes to the top of the page as
soon as OCR is done.
8) Titles, Sub-Titles and Key Words
If you mark titles, sub-titles and key words, it is very easy to move to any
place in the e-text document. Furthermore, you can automatically create a
five-level outline with key words added in the appropriate sections. No retyping
or handwriting is required. Such outlines are tremendous study aids and are
essentially a free by-product of scanning. Here again it is best to scan one
page at a time, or from an open book that will lie flat.
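As an illustration of how marked headings can become an outline with no retyping, here is a minimal Python sketch. It assumes the marking codes listed under Tips on Editing (<:% new part, <:# chapter, <:= primary, <: secondary, <:- tertiary sub-title, <:> key word), that each heading marker begins its own line, and, hypothetically, that a key-word marker sits directly in front of a single word. It is not the Proportional Reading program's own outliner.

```python
import re

# Heading markers from the marking table, with their outline depth
# (0 = new part of a book, 4 = tertiary sub-title). The secondary
# marker "<:" is a prefix of the others, so it must be tried last.
HEADING_LEVELS = [("<:%", 0), ("<:#", 1), ("<:=", 2), ("<:-", 4), ("<:", 3)]
# Assumption: a key-word marker sits directly in front of one word.
KEY_WORD = re.compile(r"<:>(\w+)")

def build_outline(text: str) -> list[str]:
    """Return outline entries, indented two spaces per level."""
    outline: list[str] = []
    depth = 0  # depth of the most recent heading
    for raw in text.splitlines():
        line = raw.strip()
        heading = None
        if not line.startswith("<:>"):  # "<:>" marks a key word, not a heading
            for marker, level in HEADING_LEVELS:
                if line.startswith(marker):
                    heading = (line[len(marker):].strip(), level)
                    break
        if heading is not None:
            title, depth = heading
            outline.append("  " * depth + title)
        else:
            # Key words are listed one level below the current heading.
            for word in KEY_WORD.findall(line):
                outline.append("  " * (depth + 1) + "* " + word)
    return outline

sample = """<:#Chapter 1 Cells
Every living thing is made of <:>cells.
<:=The Cell Membrane
The membrane controls what enters the <:>cytoplasm."""
print("\n".join(build_outline(sample)))
```

The five heading levels map directly onto indentation, so the outline comes out in the same order as the book with each key word filed under the section where it appears.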
Tips on Scanning and OCR'ing Text
Scanning an Open Book
When scanning an open book, you do not want to sit down and stand up repeatedly.
This is very hard on the body. It is much easier to scan two open pages, turn
the page, scan the next two open pages, and so on. When you have finished
scanning, go back with the book and zone, OCR, and check two pages at a time.
Alternatively, you can zone all the pages and then OCR the lot, or you can tell
the program to zone and OCR the lot automatically.
Another good trick is to place an open book on the scanner with a weight on top
of it and scan two pages at a time. This way you don't have to press down on
the book binding yourself while the scanner is working. Use a gallon of
water in a plastic jug for a weight. Build up an area next to the scanner to the
same height as the lid, using telephone books or other books. Now you can just
drag the water on and off the scanner lid (from the top of the pile). No lifting
of the weight is required.
Cutting Out Pages
The way to cut out the pages of a book is to leave the two covers and binding in
place. Set the book on a piece of scrap wood on the corner of a table with the
bottom cover hanging vertically off the scrap wood and edge of the table. This
way there is no chance of cutting the table or cutting off the back cover of the
book. Lay a straight edge in from the binding about 1/4" on the first internal
page and cut along this guide with a sharp knife, making several passes. You
should be able to free up about 50 pages before you need to remove these pages
and reset the straight edge. Cutting out the pages this way leaves a smooth
surface for re-gluing pages with any wood glue.
A book can be cut apart this way in about two minutes. If you don't want to
reglue the pages, reset them in the cover (still completely intact) and add a
rubber band. Frequently it is much easier to read loose pages than bound pages.
Re-gluing pages is very simple. Just add some wood glue to the binding and to
the binding edge of the pages and stick the pages in the binding. Let it set
overnight. The new binding will work just as well as before.
Notes: Some pages are printed right to the center "gutter". This makes manually
scanning one or two pages at a time impossible. It is also impossible to copy
such pages. These pages have to be cut out to be scanned. Secondly, tiny
paperback pages are too small to fit in most document feeders. These pages
should be scanned manually, two pages at a time with deferred OCR, or copied
first and then inserted into the automatic document feeder.
However, cutting and then re-gluing is not workable for library books.
Making Copies of Pages
Making copies of pages and then scanning these copies has some drawbacks, but
can be done quickly and effectively if you use the highest quality scanning
approach. Making copies loses much clarity, which leads to more errors; it
requires an excellent copier; it costs money for the copier, paper, and toner;
and it requires costly upkeep for wear and tear on the copier. It also requires
a document feeder and purchasing and transporting lots of paper. If you don't
separate pages before copying, the book must be able to lie flat on the scanning
window and the text must not curl in towards the gutter. Copied pages can easily
get out of order and must be checked before scanning to make sure that they are
in order and that no extra blank pages have been inserted by mistake. Often
pages just out of the copier must be reordered. Using a copier, the average
250-page book would cost at least $6.00 for copying, before scanning even begins.
You can copy onto either 8 1/2" x 11" paper or 8 1/2" x 14" paper.
However, you can quickly process any book this way, especially if you copy two
pages at a time. You can easily copy 300 pages an hour, two pages at a time.
These pages can be inserted into the document feeder as they come off the
copier. Scanning can occur simultaneously. Putting copies of pages in a document
feeder is a great solution for scanning borrowed books.
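A quick back-of-the-envelope check of the copying figures above, as a Python sketch. The default rates are only derived from the text ($6.00 minimum for a 250-page book, roughly 300 pages an hour copying two pages at a time), so treat the results as rough estimates at those rates, not current prices.

```python
def copying_estimate(book_pages: int,
                     dollars_per_page: float = 6.00 / 250,
                     pages_per_hour: float = 300) -> tuple[float, float]:
    """Return (minimum copying cost in dollars, hours at the copier).

    Defaults are derived from the text: at least $6.00 for a 250-page
    book, and about 300 pages an hour when copying two pages at a time.
    """
    return book_pages * dollars_per_page, book_pages / pages_per_hour

for pages in (250, 400):
    cost, hours = copying_estimate(pages)
    print(f"{pages} pages: at least ${cost:.2f}, about {hours * 60:.0f} minutes")
```

At these rates a 250-page book costs at least $6.00 and takes about 50 minutes at the copier, before any scanning is done.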
The Best Plan
So, what is the solution? By far the best approach, whenever possible, is to scan
an open book that will lie flat, scanning one or both pages at a time. The next
best approach is to cut the pages away from the binding whenever possible, scan
them, and then reglue them to the binding. The book will work perfectly. The
third best approach is to make single sided copies of either one or two pages at
a time and run the copies through the automatic document feeder.
Note: Some small paperbacks are printed on very poor quality paper with too much
ink. As a result, the letters are badly formed, and scanning even at the best
quality level will not be successful. In this situation, the best approach is to
get a library edition of the book to scan. Don't waste your time on the poorly
printed copy.
Page Orientation and Differentiation
If you are scanning a regular book or a paperback two pages at a time, you will
have the book turned sideways with the lower left corner of the left page in the
upper right corner of the scanner. If you are copying large pages one at a time
or using large paper, you will have the book upside down, but with the tops of
the pages towards the top of the machine. Make sure you tell the scanner program
which way the text is facing: vertical (portrait) or sideways (landscape).
If you are scanning two pages at a time, it is important to make sure the
scanner differentiates between the left and right page. This can be a problem
if the margins and the gutter between the pages are too narrow; if the pages
are not differentiated, text from the two pages will merge. It is also important
to cut out all the heavy black areas around the margins and in the gutter.
Otherwise, these areas will be read as characters.
One solution for this problem is to manually zone the image before scanning the
next page.
If you want to do automatic zoning, there is an easy way around these problems.
Mark either side of the copy window halfway up its length. Always center the
book gutter on this center line each time you set the book down on the scanner
bed. Then manually zone the page as two zones (one for each page), cutting
out the areas of black. Be sure to zone the earlier page first (otherwise, the
second page will always come before the first). Now save the zone template and
call it up for this book. Pages will be automatically separated in scanning and
black areas will be ignored.
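The two-zone template can be sketched in Python as a pair of rectangles, purely as an illustration. It assumes the scan is described by its pixel width and height, that the image has been rotated so the earlier page is on the left, that the gutter sits exactly on the marked center line, and that fixed margin and gutter widths (hypothetical values) trim the black areas. OmniPage keeps its own zone templates; nothing here reflects its file format. What the sketch shows is the ordering rule: the earlier page's zone comes first, so it is recognized first.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Zone:
    left: int
    top: int
    right: int
    bottom: int

def two_page_zones(width: int, height: int, margin: int, gutter: int) -> list[Zone]:
    """Return one zone per page, earlier page first.

    margin trims the black border at the outer edges; gutter trims the
    dark strip on each side of the center line.
    """
    center = width // 2  # the book gutter is centered on the marked line
    if center - gutter <= margin:
        raise ValueError("margin and gutter leave no room for text")
    earlier = Zone(margin, margin, center - gutter, height - margin)
    later = Zone(center + gutter, margin, width - margin, height - margin)
    return [earlier, later]  # order matters: zones are read in this order

print(two_page_zones(width=3400, height=2200, margin=60, gutter=80))
# [Zone(left=60, top=60, right=1620, bottom=2140), Zone(left=1780, top=60, right=3340, bottom=2140)]
```

Because the book is always centered on the same line, the same two rectangles fit every spread, which is why one saved template can be reused for the whole book.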
Alternatively, you can set the scanner to scan both pages without any zones.
Then, after the scanning is finished and before the text recognition function
starts, manually rezone each page. At the same time you can cut out
graphics and headers. You can also make the page number of each page the first
and top item on that page by selecting it first, even if the page number is on
the bottom of the page. The best approach is not to zone the page number and to
type it in later at the top of the page, or ignore it completely and delete it
later.
Note: When you scan original individual pages (cut out from the book binding)
one at a time, either manually or in a document feeder, there is no gutter
problem, nor problem with black areas.
If you are scanning one page at a time you may want to zone, OCR and edit each
page right after it is scanned. This is fine. However, if you are doing two
pages at a time, or if you want to make maximum use of your scanner, and/or if
you wish to have the OCR done automatically while you do something else, you
should scan all the pages first into separate files which can be finished later.
Later you, or somebody else on another machine, can zone the pages manually or
have them zoned automatically when OCR is done. Then the pages are OCR'd and
edited. It's usually best to scan all the pages first.
Lighten-Darken Control on Scanner (Brightness)
If you choose the fastest scanning speed, you will have to set the brightness
level yourself. On the other hand, if you choose the quality scanning speeds,
the scanner will automatically choose the brightness level for you.
If you are setting the brightness level yourself, be sure to scan and check just
one page of text to begin with. It is important to check the scanning as it
occurs. It is very important that the letters not have broken or missing parts.
Cancel the scanning and move the brightness control towards darken if this is
the case. Then rescan the page for a second check.
To do this, make sure the boxes for multiple pages and deferred recognition are
not checked. The box for automatically saving a document should also be
unchecked.
It is also very important that the letters do not run together. If this is
happening, lighten the brightness control. What you are looking for is the point
right between these two problems. Too much correction for one problem causes the
other problem. Actually, the OCR program does not mind if the letters are very
close, but it minds terribly if the letters are not completely formed or parts
of letters are broken.
Don't have letters any thicker than necessary. If you do, open sections in
letters like "a" and "e" will get blocked out. These letters will subsequently
be misread by the character recognition program.
Start off by scanning just a single copy of text (one or two pages on the copy).
Look at the little view window as the scan is progressing. Cancel the scan and
reset the brightness control and re-scan as often as necessary, until you think
you have scanned a single page of text correctly.
Then, when the scanning ends, look at the actual document. Doing this will
uncover many setting errors that would otherwise go unnoticed. If you see on
your scanned document a number of letters which are only part of the full
letters they are supposed to be ("c" instead of "d" for example, "lll" instead
of "M"), then you need to darken the brightness control.
Making this kind of check is the best way to avoid a lot of wasted time. Now is
the point to take some extra time. Darken or lighten the brightness control and
repeat the process until you have a clean document of text. Now start to scan.
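The adjust-and-rescan routine above is a search for the setting between two failure modes, and halving the range each time keeps the number of test scans small. A minimal Python sketch, assuming a hypothetical inspect(setting) callback that stands in for you looking at a test scan and reporting "broken" (darken), "merged" (lighten), or "ok"; the 0 to 100 scale, with larger meaning darker, is also an assumption rather than the scanner's actual control.

```python
from typing import Callable, Optional

def find_brightness(
    inspect: Callable[[int], str],
    lightest: int = 0,
    darkest: int = 100,
    max_scans: int = 10,
) -> Optional[int]:
    """Bisect the brightness control between broken and merged letters.

    inspect(setting) stands in for a test scan you check yourself and
    must return "broken", "merged", or "ok". Larger settings are darker.
    """
    lo, hi = lightest, darkest
    for _ in range(max_scans):
        if lo > hi:
            break  # no setting in range gave clean letters
        setting = (lo + hi) // 2
        verdict = inspect(setting)
        if verdict == "ok":
            return setting
        if verdict == "broken":
            lo = setting + 1  # letters broken or incomplete: darken
        else:
            hi = setting - 1  # letters running together: lighten
    return None

def example_scan(setting: int) -> str:
    # Hypothetical page: letters break up below 62 and merge above 70.
    if setting < 62:
        return "broken"
    if setting > 70:
        return "merged"
    return "ok"

print(find_brightness(example_scan))
```

On this made-up page the search tries 50, then 75, then 62, and stops after three test scans.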
When you have this control adjusted correctly, there will be a minimum of
spelling errors. All your downstream efforts at Proportionalizing and reading
text will be frustrated if you have a lot of unnecessary spelling errors which
you will have to correct or accept.
Remember: The easiest way around this whole chore is to use the slowest speeds
(best quality) of the scanner. In these modes, brightness level is automatically
adjusted. Note: the scanner will be operating as a greyscale scanner.
Don't Retain Graphics
Set the OCR program not to retain graphics. This will save you a lot of later
deleting and it will speed up OCR.
Retain Font and Paragraph Formatting
Set the OCR options to retain font and paragraph formatting. This way the OCR
text will look very much like the original text and you can clearly see
italicized and bolded words. This makes adding special marks to titles and
sub-titles and key words very easy.
Turn On Virtual Memory
If you are scanning more than just 8-10 pages of plain text, you need to turn on
virtual memory. Otherwise, you will quickly run out of RAM and scanning will
stop. Automatically scanning 100 pages can easily use up 50 megabytes of memory
while the text is being scanned and recognized. This is only a temporary use,
unless you save the working Caere document on the hard drive. After the actual
text has been created, you manually or automatically throw out the working file.
You must remember to do this or your hard drive will quickly fill up. When you
are finished scanning, be sure to turn virtual memory off, as it causes the
Proportional Reading program and other programs to run much slower than normal.
Special Situations
Occasionally the scanner will interpret a big gap between introductory numbers
and related text as two separate columns. This can also happen with dialogue
where each speaker has a name set off by a space. These situations are easy to
correct. Just rezone the text as one unit.
Also, sometimes a list will have several columns which get read as one unit of
text. You may need to rezone the list into two or more columns in proper
sequence. A quick look at how the list has been zoned will tell you if you need
to make a correction. It is easy to delete the current zones on a page and redo
the zones and OCR. It is also easy to delete the current page and re-scan it.
Deferred Recognition
The fastest way to scan is with multiple pages in the document feeder and the
multiple page and the deferred optical character recognition options turned on.
These are two boxes which you check or uncheck before you start to scan. With
both boxes checked the scanner will scan one page after another and defer
character recognition until you are done scanning.
To manually scan one page after another, just press Command+L after you turn
each page.
You will need extra hard disk space if you are going to use deferred
recognition. You should plan on leaving at least 50 to 100 megs free, depending
on how many pages of text you want to scan at a time before doing the text
recognition. Forty pages of text can easily temporarily use up to 20 megs of
hard disk space as a Caere file. After recognition the resulting text may only
be 200k. All the bit maps with their large memory requirements will have gone
away or are ready for you to delete, depending on which choice you have made.
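The disk figures above work out to roughly half a megabyte of temporary Caere file per page (40 pages, about 20 MB) against about 5 KB of recognized text per page (40 pages, about 200 KB). A small Python sketch that scales those rates to other batch sizes; the rates come only from the text's example, so treat the results as rough planning numbers.

```python
# Per-page rates derived from the figures in the text:
# forty pages can use up to 20 MB as a Caere file, and the
# recognized text for those pages may be only about 200 KB.
MB_PER_DEFERRED_PAGE = 20 / 40
KB_PER_TEXT_PAGE = 200 / 40

def deferred_space(pages: int) -> tuple[float, float]:
    """Return (temporary MB while recognition is deferred, final text KB)."""
    return pages * MB_PER_DEFERRED_PAGE, pages * KB_PER_TEXT_PAGE

for pages in (40, 100, 200):
    temp_mb, text_kb = deferred_space(pages)
    print(f"{pages} pages: ~{temp_mb:.0f} MB deferred, ~{text_kb:.0f} KB of text")
```

Two hundred pages comes to about 100 MB, the upper end of the 50 to 100 MB of free space suggested above, while the finished text is only about 1 MB.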
Saving Scanned Text
Be sure to save the text as ASCII text without hard returns added at the end of
each line.
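"Without hard returns" means one line per paragraph rather than one line per printed line. A minimal Python sketch of that conversion, assuming paragraphs are separated by blank lines and that the p# page-number lines described under Tips on Editing should stay on their own line. It does not rejoin words hyphenated at line ends. Normally you would simply choose the matching save option in the OCR program; the sketch only shows what the result looks like.

```python
def unwrap(text: str) -> str:
    """Join hard-wrapped lines into one line per paragraph."""
    paragraphs: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("p#"):
            # A blank line or a page marker ends the current paragraph.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            if stripped:
                paragraphs.append(stripped)  # keep page markers on their own line
        else:
            current.append(stripped)
    if current:
        paragraphs.append(" ".join(current))
    return "\n".join(paragraphs)

wrapped = "p#12\nScanning a book can be\nextremely easy.\n\nOtherwise it will be\na nightmare.\n"
print(unwrap(wrapped))
```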
Other Scanning Tips
In actual practice, you can scan about 20 pages (40 sides) at a time and then
tell the scanner that you are done. The scanner then makes a file for later
recognition. Then you make more files of 40 or so sides each. When you are ready
you can zone each page and save the file. Then you can tell the OCR program to
open up all these deferred files in order and the program will OCR each file in
turn. This process can take place while you are at lunch or sleeping.
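To keep the deferred files opening in the right order, it helps to number them with a fixed width. A minimal Python sketch that splits a book into files of about 40 sides each; the file names are hypothetical and only illustrate why zero-padding matters, since a name ending in "part10" would otherwise sort before one ending in "part2".

```python
def plan_batches(book: str, total_sides: int,
                 sides_per_file: int = 40) -> list[tuple[str, int, int]]:
    """Split a book into deferred files of about 40 sides each.

    Returns (file name, first side, last side) for each file. The names
    are hypothetical; the zero-padded number keeps them sorting in
    scanning order.
    """
    batches = []
    starts = range(1, total_sides + 1, sides_per_file)
    for number, first in enumerate(starts, start=1):
        last = min(first + sides_per_file - 1, total_sides)
        batches.append((f"{book}-part{number:02d}", first, last))
    return batches

for name, first, last in plan_batches("biology", 250):
    print(name, first, last)
```

A 250-side book becomes seven files, the last holding sides 241 to 250, and an alphabetical listing of the names is also the order to recognize them in.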
For maximum use of the scanner, transfer documents containing only scanned (not
yet recognized) pages to another computer where zoning, OCR, spell checking, and
final editing will take place. If you don't have a network, use a removable
cartridge hard drive.
Transfer files will be large, but once processed the same cartridge can be
reused over and over. This way one scanner can scan many books each day.
Individual teachers or students can finish the OCR work on their own computers.
Note: Be sure to remove all deferred files from your hard drive after they have
been turned into text. You can choose to do this automatically. Each deferred
file is like a group of pictures, and takes up a tremendous amount of space on
your hard drive. Left to accumulate, they will quickly eat up all your disk
space.
The Proper Optical Recognition Program
It is important to use a good scanner and the OmniPage Professional optical
character recognition program, version 6. This program is simply the best that is
available. It is the only recommended choice.
Why Choose the 4C
The Hewlett-Packard 4C flatbed color scanner with an automatic document feeder
is an ideal machine for scanning books. Other scanners can be used. In fact,
Hewlett-Packard makes a black and white scanner which also has a document feeder
and sells at half the price of the 4C. Since all optical character recognition
is done in black and white, why use the color scanner? The following points are
offered:
1) The document feeder on the 4C takes pages as small as 5" x 7". The
(greyscale) scanner has a minimum size which is much larger than the 4C. This in
turn means that middle-size paperbacks cannot be cut apart and fed
automatically on the greyscale scanner. They must be copied first. The reason
for all this is that pages feed from the side of the machine and from the side
of the paper (longer direction) on the 4C and from the top of the machine and
the top of the paper on the greyscale scanner. A small page which is too narrow
for top loading often still has sufficient size for automatic loading if
loaded from the side.
2) Pages are more stable when scanned in the 4C. This is because the paper moves
in the greyscale scanner, while the scanner light moves in the 4C.
3) With the 4C, color pictures from original text can be scanned in and added
after text is recognized and in WordPerfect. Obviously, a greyscale scanner
can't add color.
4) The flatbed on the 4C is much longer than the flatbed on the greyscale
scanner. This means that fairly large books can be laid down on the 4C and
scanned two pages at a time. You simply cannot do this on the greyscale scanner
flatbed.
5) Color adds a great deal to almost all presentations. The 4C allows students
to make Proportional Reading articles using their own color pictures or color
pictures downloaded from many other sources besides books.
6) The 4C can be used by other departments than just reading. Therefore, it can
be better justified than the greyscale scanner, as the expense can be amortized
over more people and more departments.
7) The 4C document feeder holds fifty separate pages while the greyscale scanner
only holds twenty. Tending the machine to restock the document feeder can be cut
way down with the 4C.
Tips on Editing
After scanning a book or article it is necessary to do a little editing to
maximize later reading. All of these steps are optional, but you will be very
pleased if you go through these steps. All of these steps can be done very
quickly.
There are two places to do editing. The first editing is done in the Caere
document right after OCR has taken place. The second editing is done in the
saved ASCII text which has been reopened in your word processor.
Editing Right after OCR in the Caere Document
The best way to edit pages is to check the pages as Caere documents first.
Always have the original text on a slant board just below the monitor. As you
click on the window to bring up the next page, turn the page of the original
text just below the screen. If you have separated pages, this is even easier to
do, as the pages lie flat.
Start by adding the page number. As each page comes up you should add a page
number indicator to the top of the page, like "p#" and then the actual page
number. Then press return to put the page number info on its own line. If you
have scanned two pages at once, mark the second page now. If you did not already
cut out headers in the zoning process, cut out the headers now. All this is easy
to do because the cursor automatically goes to the top of each page as it comes
up.
Adding the page number to the top of the page is important to do for many
reasons, one of which is that text saved in ASCII format will not be saved as
separate pages and it is otherwise very difficult to know where one page ends
and the next page starts.
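Because the saved ASCII file has no page breaks, the p# lines are what let you (or a program) find page boundaries again. A minimal Python sketch, assuming each marker sits on its own line as "p#" followed by the page number; page numbers are kept as strings so roman-numeral front matter works, and any text before the first marker is ignored.

```python
import re

# A line holding only "p#" and a page number, e.g. "p#12" or "p# xiv".
PAGE_MARK = re.compile(r"^p#\s*(\S+)\s*$")

def split_pages(text: str) -> dict[str, str]:
    """Map each page number to the text that follows its p# line."""
    pages: dict[str, str] = {}
    current = None
    lines: list[str] = []
    for line in text.splitlines():
        match = PAGE_MARK.match(line.strip())
        if match:
            if current is not None:
                pages[current] = "\n".join(lines).strip()
            current, lines = match.group(1), []
        elif current is not None:
            lines.append(line)
    if current is not None:
        pages[current] = "\n".join(lines).strip()
    return pages

print(split_pages("p#12\nFirst page text.\np#13\nSecond page text.\nMore of page 13."))
```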
After marking the page number, scroll down the text looking for any areas of
colored text. These are areas the OCR program could not read. They need to be
deleted or corrected. Usually they are parts of pictures or misread letters in
bold or italicized sections. Delete or correct these colored areas.
Also check any columns to make sure they have been zoned correctly. If not,
click back on the zone picture and redo all the zones: press Command+a and then
press Return. A window will appear asking whether you really want to remove all
the zones from this page. Answer "yes". Now click on the zoning tool, rezone the
page, and OCR just this page by typing Command+r.
While you are moving your eyes down each page, make sure each paragraph ends
where it should. Sometimes blank lines need to be deleted and separated text
stitched together. If a paragraph begins with an indent, occasionally its first
or last full line will be placed at the beginning of the paragraph instead of at
the end. Look for this and cut and paste any such lines back to their rightful
place.
This is also a good time to mark titles and subtitles, boxes, captions, and key
words if you wish. It is easy to do this now because bolded words show up
clearly as bold and the paragraph formatting matches the original. You can use
the keyboard and Shift key in the regular manner, or you can quickly type the
marking combinations using the triple letter keystrokes and 555 and 554. If you
are doing this in WordPerfect, you can use the macro keystrokes (Option plus a
key) listed before the triple letter keystrokes in the list below. However,
these WordPerfect macro keystrokes won't work in Caere documents, which is why
you use the triple letter keystrokes there.
for <:# (indicates a chapter title) Type: Option+a or aaa
for <:= (indicates a primary sub-title) Type: Option+s or sss
for <: (indicates a secondary sub-title) Type: Option+d or ddd
for <:- (indicates a tertiary sub-title) Type: Option+f or fff
for <:> (marks a selected name or word) Type: Option+g or ggg
for <:% (marks a new part of a book) Type: Option+h or hhh
for p# (marks a page number) Type: Option+z or zzz
for << (marks beginning of caption or box of text) Type: Option+Comma or 555
for >> (marks end of caption or box of text) Type: Option+Period or 554
If you use the triple letters and 555 and 554, you need to run the change code
program in WordPerfect, which changes these keystrokes into the right codes. The
triple letter codes and 555 and 554 are used mainly in Caere documents, where
the macro keystrokes won't work, and they save a great deal of time. To run the
change code program in WordPerfect, type Control+Option+Command+c.
Now save the text as ASCII text.
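The change code program itself is a WordPerfect macro. As a rough illustration of the substitution it performs, here is a Python sketch; the name change_codes is hypothetical, and it assumes 554 (end of caption or box) becomes >>, as stated under Marks for Text Boxes and Captions for Graphics. It is a plain search-and-replace, so a genuine "555" in the text (a phone number, for example) would also be converted; check the results.

```python
import re

# Shortcut keystrokes from the list above, mapped to the marks they stand for.
CHANGE_CODES = {
    "aaa": "<:#",  # chapter title
    "sss": "<:=",  # primary sub-title
    "ddd": "<:",   # secondary sub-title
    "fff": "<:-",  # tertiary sub-title
    "ggg": "<:>",  # selected name or word
    "hhh": "<:%",  # new part of a book
    "zzz": "p#",   # page number
    "555": "<<",   # beginning of caption or box (assumed pairing)
    "554": ">>",   # end of caption or box (assumed pairing)
}
_PATTERN = re.compile("|".join(map(re.escape, CHANGE_CODES)))

def change_codes(text: str) -> str:
    """Replace every shortcut with its mark in one left-to-right pass."""
    return _PATTERN.sub(lambda m: CHANGE_CODES[m.group(0)], text)

print(change_codes("zzz12\naaaChapter One\n555Photo caption554"))
```

Doing all replacements in a single pass means a mark that has just been inserted is never itself re-scanned and changed again.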
Editing Saved Text in WordPerfect or Another Word Processor
Open the saved text in WordPerfect or another word processor and spell-check
it. Place the small spelling window at the bottom of the screen so you can see
the text as each word is found. If you reduce the size of the text, you can see
the page number of either the current or the next page almost every time. This
enables you to follow along in the original book if necessary.
The first time a new name comes up, add it to the vocabulary list so that it
won't be flagged again. Many of the remaining spelling errors will simply be a
matter of adding hyphens between words.
Do not worry about paragraph indents. All these indents (if present) are
automatically removed later during Proportionalizing.
The Last Word on a Page
A word at the end of a page may be split, with its second half at the start of
the next page. If so, the first half will be missing its hyphen, and you should
add one. Alternatively, you can delete the hard return between the two word
parts, knitting them back together. This is often more work, because the page
number usually falls between them.
Page Numbers on the Bottom of the Page
Make sure that page numbers are on the top of the page.
Marks for Text Boxes and Captions for Graphics
All of these should be marked with << before and >> afterwards.
Footnotes
Footnotes should either be cut out completely or placed next to their reference
number in the text. First, type a period after each footnote number in the text
itself, so that sentences end properly with a final period. This is needed
because footnote numbers are printed right at the end of sentences without a
space, so they are read as part of the preceding word. Adding a period after the
number lets the PR program recognize the end of the sentence.
Next, select and cut the footnotes. Either discard them, paste them next to
their reference number in the text (separated by a space), or treat them like
captions.
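When a chapter has many footnote numbers, the periods can be added with a pattern search instead of by hand. This Python sketch is only an approximation: the name add_footnote_periods is hypothetical, and it assumes the footnote number is one to three digits stuck directly to the last word (with or without the sentence's own period) and followed by a capital letter or the end of the text. It will also catch words that genuinely end in digits, so review each change.

```python
import re

# Assumption: a 1-3 digit footnote number is glued to the last word of a
# sentence ("war.12" or "war12") and is followed by whitespace plus a capital
# letter, or by the end of the text.
FOOTNOTE_REF = re.compile(r"([A-Za-z][.!?]?)(\d{1,3})(?=\s+[A-Z]|\s*$)")

def add_footnote_periods(text: str) -> str:
    """Type a period after each footnote number that ends a sentence."""
    return FOOTNOTE_REF.sub(r"\1\2.", text)

print(add_footnote_periods("It ended in war.12 Then peace came."))
# It ended in war.12. Then peace came.
```

Numbers separated from the word by a space, such as a year ("in 1914 After"), are left alone, and text that already has the period is not changed a second time.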
Margin Notes
Margin notes should be removed or treated as captions. The easiest thing to do
is to cut them out when you block text.
Math
Math equations need to have the spaces between characters removed. Otherwise,
each number in the equation will appear on a separate line when it is presented
in Proportional Reading.
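The reason is that Proportional Reading displays text one word at a time, and words are separated by spaces. A minimal Python sketch, assuming whitespace is the word separator (the name close_up_equation is hypothetical):

```python
def close_up_equation(equation: str) -> str:
    """Remove all whitespace inside an equation so it reads as one word."""
    return "".join(equation.split())

before = "x = 3 + 4"
print(before.split())             # ['x', '=', '3', '+', '4']
print(close_up_equation(before))  # x=3+4
```

Split on spaces, the original equation becomes five separate words; closed up, it stays together as one.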
Furthermore, scanning usually does a terrible job on subscripts, superscripts,
and fancy math graphics. If you do not want to rework the math, it may be easier
to treat math sections like a graph and have the student refer to the
appropriate page in the book. Type in the word "SeePage".
The third and best approach for math equations is to cut them from the text and
re-scan them as a line drawing graphic which you copy and paste into the word
processor text at the right point.
Adding Interactive Pauses
If you want to add pauses so that the text becomes interactive questions and
answers as it is read, now is a good time to do this. All you need to do is type
a ~ in the sentence where you want a pause to occur. When the text is
Proportionalized, these marks are automatically turned into hidden signals that
the reading programs act on if you so choose. Otherwise, the pauses are skipped.
Reversed Titles
Reversed titles, where the letters are white on a black background, will not
scan. You must retype any such titles.
Saving Prepared Text
It is a very good idea to save text once it is fully prepared for
Proportionalizing. This text can be read as a regular word processing file, and
it takes up much less storage: the same amount of text takes about six times as
much space once it has been Proportionalized.
If you are working with a lot of books that you will not use very often, you
may want to save them as text files and Proportionalize a whole book overnight
when it is needed. Saved this way, the average book fits on just one diskette
(1.4 megs.).
By contrast, only about seventy pages of Proportionalized text fit on each
diskette (1.4 megs.).
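The two diskette figures agree with the six-to-one storage ratio. Taking "1.4 megs." as 1.4 million bytes (an assumption), a quick Python check shows roughly how much room a page needs in each form; the per-page sizes are derived from the figures above, not measured.

```python
DISKETTE_BYTES = 1_400_000   # "1.4 megs." read as 1.4 million bytes (assumption)
PR_PAGES_PER_DISKETTE = 70   # Proportionalized pages per diskette, from the text
EXPANSION = 6                # Proportionalized text takes about 6x the space

bytes_per_pr_page = DISKETTE_BYTES / PR_PAGES_PER_DISKETTE   # 20,000 bytes
bytes_per_text_page = bytes_per_pr_page / EXPANSION          # about 3,333 bytes
text_pages_per_diskette = PR_PAGES_PER_DISKETTE * EXPANSION  # 420 pages

print(f"Proportionalized page: {bytes_per_pr_page:,.0f} bytes")
print(f"Plain text page: {bytes_per_text_page:,.0f} bytes")
print(f"Plain text pages per diskette: {text_pages_per_diskette}")
```

So a book of up to about 420 pages fits on one diskette as plain text, against about seventy pages once Proportionalized.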
The best approach for a school is to keep all the books in current use on a file
server in Proportional format, as locked files. Each student downloads
Proportionalized text as needed from the server onto his or her own computer or
a lab computer, plays it as he or she wishes, marks the text as desired, and
saves selections to personal files. This way text can also be sent via modem
over the phone lines to students at home. The whole process can run
automatically without involving school personnel.
Text Section with Too Many Hard Returns and Tabs
Occasionally, the OCR program will create a short section of text that is all
chopped up, with extra tabs and hard returns in it. This almost always happens
with indented text. The problem is easy to fix. Select the section of text, then
go to the Search menu and activate Find/Change. Set the Direction submenu to
"Within Selection", insert "hard return" in the Find line, and click Change All.
Next, insert "tab" in the Find line and again click Change All. Your section of
text will be fixed.
Note: Be sure to choose "Within Selection", or you will remove all the hard
returns and/or tabs in the entire document.
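Outside the word processor, the same "within selection" idea can be sketched in Python. This is an illustration, not WordPerfect's Find/Change: the name flatten_selection is hypothetical, the selection is given as start and end character positions, and, as an assumption, each run of hard returns and tabs is replaced with a single space so the words on either side do not run together.

```python
import re

# A run of whitespace that contains at least one hard return or tab.
BREAKS = re.compile(r"\s*[\r\n\t]\s*")

def flatten_selection(text: str, start: int, end: int) -> str:
    """Turn hard returns and tabs into single spaces, only inside text[start:end]."""
    fixed = BREAKS.sub(" ", text[start:end])
    return text[:start] + fixed + text[end:]

doc = "Keep this.\n\nThis\n\tline\n\nis\tchopped\n\nKeep this too."
start = doc.index("This")
end = doc.index("chopped") + len("chopped")
print(flatten_selection(doc, start, end))
```

The paragraph breaks before and after the selection survive, which is exactly why the selection limit matters.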